# Low-resource deployment
## Nvidia OpenReasoning Nemotron 32B GGUF
Large Language Model · bartowski · 2,382 · 1
A quantized version of NVIDIA OpenReasoning-Nemotron-32B, produced with llama.cpp to reduce storage and compute requirements for easier deployment.

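The GGUF entries in this list are all loaded the same way. A minimal sketch using the llama-cpp-python bindings, assuming a quant file has already been downloaded locally; the file name and parameters below are illustrative, not confirmed paths:

```python
from llama_cpp import Llama

# Load a local GGUF quant; the path and file name are placeholders.
llm = Llama(
    model_path="./OpenReasoning-Nemotron-32B-Q4_K_M.gguf",
    n_ctx=4096,        # context window; raise it if you have the RAM
    n_gpu_layers=-1,   # offload all layers to GPU when one is available
)

out = llm("Explain GGUF quantization in one sentence.", max_tokens=128)
print(out["choices"][0]["text"])
```

Smaller quants (Q4_K_M and below) trade some quality for memory, which is usually the right trade on laptops and edge devices.
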
## Nvidia OpenReasoning Nemotron 1.5B GGUF
Large Language Model · bartowski · 660 · 4
A quantized version of NVIDIA OpenReasoning-Nemotron-1.5B, optimized with llama.cpp for better efficiency and performance across different hardware.

## Openreasoning Nemotron 32B Q4 K M GGUF
Large Language Model · Transformers · Supports Multiple Languages · sm54 · 127 · 1
A GGUF-format conversion of nvidia/OpenReasoning-Nemotron-32B that can be used with llama.cpp.

## Thedrummer Cydonia 24B V4 GGUF
Large Language Model · bartowski · 3,869 · 4
A llama.cpp-based quantization of TheDrummer's Cydonia-24B-v4 that can run efficiently on devices with limited resources.

## Voxtral Mini 3B 2507 Transformers
Apache-2.0 · Audio-to-Text · Transformers · Supports Multiple Languages · MohamedRashad · 416 · 2
Voxtral Mini is an enhanced version of Ministral 3B with advanced audio input capabilities, performing strongly in speech transcription, translation, and audio understanding.

## Kyutai Helium 1 2b GGUF
Large Language Model · Transformers · Supports Multiple Languages · tensorblock · 154 · 1
A GGUF-format model file based on kyutai/helium-1-2b, quantized by TensorBlock, supporting multiple languages.

## LFM2 1.2B MLX Bf16
Other · Large Language Model · Transformers · Supports Multiple Languages · lmstudio-community · 192.07k · 1
LFM2-1.2B is a 1.2B-parameter multilingual text generation model from LiquidAI, optimized for Apple Silicon chips.

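MLX builds like this one (and the other mlx-community/lmstudio-community entries below) run on Apple Silicon through the mlx-lm package. A minimal sketch; the repo id is a guess based on the listing title, and any MLX-format repo id or local path works in its place:

```python
from mlx_lm import load, generate

# Load an MLX-format model from the Hugging Face hub (Apple Silicon only).
# The exact repo id below is an assumption; substitute the real one or a local path.
model, tokenizer = load("lmstudio-community/LFM2-1.2B-MLX-bf16")

prompt = "Summarize what MLX is in two sentences."
text = generate(model, tokenizer, prompt=prompt, max_tokens=128)
print(text)
```
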
## Wr30a Deep 7B 0711 I1 GGUF
Apache-2.0 · Image-to-Text · Transformers · Supports Multiple Languages · mradermacher · 262 · 1
A quantized version of prithivMLmods/WR30a-Deep-7B-0711, supporting multiple languages and suited to tasks such as text generation and image captioning.

## Huihui Gemma 3n E2B It Abliterated GGUF
Large Language Model · Transformers · English · mradermacher · 583 · 1
A statically quantized version of the Gemma-3n-E2B-it model, supporting various speech and text processing tasks.

## Diffucoder 7B Cpgrpo 8bit
Large Language Model · Other · mlx-community · 272 · 2
DiffuCoder-7B-cpGRPO-8bit is a code generation model based on apple/DiffuCoder-7B-cpGRPO and converted to MLX format, giving developers an efficient code generation tool.

## Unireason Qwen3 14B RL GGUF
Apache-2.0 · Large Language Model · Transformers · English · mradermacher · 272 · 1
A statically quantized version of UniReason-Qwen3-14B-RL, suitable for text generation and mathematical reasoning research.

## Gemma 3n E2B GGUF
Large Language Model · Transformers · English · mradermacher · 207 · 0
A statically quantized version of the Google Gemma-3n-E2B model, offering a range of quantization types to balance size and performance.

## Gemma 3n E4B It MLX Bf16
Large Language Model · Transformers · lmstudio-community · 130.21k · 3
Gemma-3n-E4B-it is a Google-developed model converted to MLX (bf16) and particularly suitable for Apple Silicon devices.

## Delta Vector Austral 70B Winton GGUF
Apache-2.0 · Large Language Model · English · bartowski · 791 · 1
A quantized version of Delta-Vector's Austral-70B-Winton. Quantization cuts storage and compute requirements while preserving good quality, making the model usable in resource-constrained settings.

## Gama 12b I1 GGUF
Large Language Model · Transformers · Supports Multiple Languages · mradermacher · 559 · 1
A quantized version of Gama-12B offering files in a range of quantization types, suitable for text generation and supporting English and Portuguese.

## Gama 12b GGUF
Large Language Model · Transformers · Supports Multiple Languages · mradermacher · 185 · 1
Gama-12B is a large language model supporting multiple languages, offering various quantized versions to meet different performance and precision requirements.

## Longwriter Zero 32B I1 GGUF
Apache-2.0 · Large Language Model · Transformers · Supports Multiple Languages · mradermacher · 135 · 1
A quantized version of THU-KEG/LongWriter-Zero-32B that supports both Chinese and English and is suited to long-context scenarios such as reinforcement learning and long-form writing.

## Skywork Skywork SWE 32B GGUF
Apache-2.0 · Large Language Model · bartowski · 921 · 2
Skywork-SWE-32B is a 32B-parameter large language model, quantized with llama.cpp imatrix quantization so that it can run efficiently in resource-constrained environments.

## Nvidia AceReason Nemotron 1.1 7B GGUF
Other · Large Language Model · Supports Multiple Languages · bartowski · 1,303 · 1
A quantized version of the NVIDIA AceReason-Nemotron-1.1-7B model that improves running efficiency across different hardware while largely preserving performance and quality.

## Openbuddy OpenBuddy R1 0528 Distill Qwen3 32B Preview0 QAT GGUF
Apache-2.0 · Large Language Model · Supports Multiple Languages · bartowski · 720 · 1
A quantized version of OpenBuddy-R1-0528-Distill-Qwen3-32B-Preview0-QAT that runs more efficiently across different hardware.

## Qwen3 Embedding 0.6B Onnx Uint8
Apache-2.0 · Text Embedding · electroglyph · 112 · 8
A uint8 ONNX quantization of Qwen/Qwen3-Embedding-0.6B that reduces model size while maintaining retrieval performance.

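ONNX exports like this one can be served with onnxruntime on CPU. A minimal sketch; the ONNX file name is a placeholder, and the input/output names and shapes depend on how the export was done, so they are checked at runtime rather than hard-coded:

```python
import onnxruntime as ort
from transformers import AutoTokenizer

# Tokenizer comes from the base model; the ONNX file name below is a placeholder.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-Embedding-0.6B")
sess = ort.InferenceSession("qwen3-embedding-0.6b-uint8.onnx")

enc = tok("GGUF vs ONNX for low-resource deployment", return_tensors="np")
# Input names vary by export; feed only the tensors the session actually expects.
feed = {i.name: enc[i.name] for i in sess.get_inputs() if i.name in enc}
outputs = sess.run(None, feed)
print([o.shape for o in outputs])  # embedding tensor layout depends on the export
```
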
## Wan2.1 T2V 14B FusionX VACE GGUF
Apache-2.0 · Text-to-Video · English · QuantStack · 461 · 3
A quantized text-to-video model, converted from its base model, that supports a range of video generation tasks.

## Wan2.1 T2V 14B FusionX GGUF
Apache-2.0 · Text-to-Video · English · QuantStack · 563 · 2
A quantized text-to-video model converted from its base model to GGUF format for use in ComfyUI, adding another option for text-to-video generation.

## Deepseek R1 0528 Qwen3 8B 6bit
MIT · Large Language Model · mlx-community · 582 · 1
A 6-bit quantized version converted from the DeepSeek-R1-0528-Qwen3-8B model, suitable for text generation tasks in the MLX framework.

## Blitzar Coder 4B F.1 GGUF
Apache-2.0 · Large Language Model · Transformers · prithivMLmods · 267 · 1
Blitzar-Coder-4B-F.1 is an efficient multilingual coding model fine-tuned from Qwen3-4B, supporting more than 10 programming languages with strong code generation, debugging, and reasoning capabilities.

## Home Llama 3.2 3B
Other · Large Language Model · Safetensors · Supports Multiple Languages · acon96 · 405 · 1
Home Llama 3.2 3B is fine-tuned from Meta's Llama 3.2 3B and is designed for controlling home devices and handling basic Q&A tasks.

## Echelon AI Med Qwen2 7B GGUF
Large Language Model · featherless-ai-quants · 183 · 1
Provides GGUF quantized files for the Echelon-AI/Med-Qwen2-7B model, backed by Featherless AI, with the aim of improving performance and reducing operating costs.

## Gemma 3n E4B It
Image-to-Text · Transformers · google · 1,690 · 81
Gemma 3n is a lightweight, state-of-the-art open multimodal model family from Google, built on the same research and technology as the Gemini models, and supports text, audio, and visual inputs.

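For the unquantized Transformers release, a minimal multimodal sketch using the generic image-text-to-text pipeline; this assumes a recent transformers version with Gemma 3n support, and the image URL is a placeholder:

```python
from transformers import pipeline

# Load the instruction-tuned Gemma 3n checkpoint; needs a recent transformers release.
pipe = pipeline(
    "image-text-to-text",
    model="google/gemma-3n-E4B-it",
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/photo.jpg"},  # placeholder URL
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    },
]

out = pipe(text=messages, max_new_tokens=64, return_full_text=False)
print(out[0]["generated_text"])
```
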
## Bielik 11B V2.6 Instruct GGUF
Apache-2.0 · Large Language Model · Transformers · speakleash · 206 · 5
Bielik-11B-v2.6-Instruct is a Polish large language model developed by SpeakLeash and ACK Cyfronet AGH, fine-tuned from Bielik-11B-v2 for instruction-following tasks.

## Phi 3.5 Mini Instruct
MIT · Large Language Model · Transformers · Other · Lexius · 129 · 1
Phi-3.5-mini-instruct is a lightweight, state-of-the-art open model built on the datasets used for Phi-3, with a focus on high-quality, reasoning-dense data. It supports a 128K-token context length and offers strong multilingual and long-context capabilities.

## Deepseek R1 0528 GGUF
MIT · Large Language Model · lmstudio-community · 1,426 · 5
A quantized model based on DeepSeek-R1-0528, focused on text generation and offering a more efficient way to run the model.

## Infly Inf O1 Pi0 GGUF
Large Language Model · Supports Multiple Languages · bartowski · 301 · 1
A quantized version of the infly/inf-o1-pi0 model, produced with llama.cpp's imatrix quantization and supporting multilingual text generation.

## Medgemma 4b It GGUF
Other · Text-to-Image · Transformers · second-state · 564 · 1
medgemma-4b-it is a multimodal model focused on the medical field, capable of processing image and text inputs, and suitable for medical scenarios such as radiology and clinical reasoning.

## Devstral Small 2505 4bit DWQ
Apache-2.0 · Large Language Model · Supports Multiple Languages · mlx-community · 238 · 3
A 4-bit quantized language model in MLX format, suitable for text generation tasks.

## Facebook KernelLLM GGUF
Other · Large Language Model · bartowski · 5,151 · 2
KernelLLM is a large language model developed by Facebook. This version is quantized with llama.cpp's imatrix method and offers multiple quantization options to suit different hardware requirements.

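Repos like this ship many quant files, and you normally want just one of them. A sketch of downloading a single file and loading it; the repo id and file name are illustrative guesses, not confirmed paths:

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Fetch one quant file instead of the whole repo; repo id and filename are assumptions.
gguf_path = hf_hub_download(
    repo_id="bartowski/facebook_KernelLLM-GGUF",
    filename="facebook_KernelLLM-Q4_K_M.gguf",
)

llm = Llama(model_path=gguf_path, n_ctx=2048)
out = llm("Write a CUDA kernel that adds two vectors.", max_tokens=256)
print(out["choices"][0]["text"])
```
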
## Verireason Qwen2.5 1.5B Grpo Small GGUF
Large Language Model · English · mradermacher · 48 · 1
A statically quantized version of the Nellyw888/VeriReason-Qwen2.5-1.5B-grpo-small model, focused on Verilog code generation and reasoning.

## A M Team AM Thinking V1 GGUF
Apache-2.0 · Large Language Model · bartowski · 671 · 1
A llama.cpp imatrix quantization of the a-m-team/AM-Thinking-v1 model, available in multiple quantization types and suited to text generation tasks.

## Qwen3 0.6B Llamafile
Apache-2.0 · Large Language Model · Mozilla · 250 · 1
Qwen3 is the latest generation of the Qwen large language model series; this 0.6B-parameter dense model brings major advances in reasoning, instruction following, agent capabilities, and multilingual support.

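A llamafile bundles the weights and a llama.cpp runtime into one executable. Once it is running it exposes an OpenAI-compatible endpoint on localhost (port 8080 by default), so it can be queried from Python with the openai client. A sketch assuming those defaults; the model name is essentially ignored by the local server:

```python
from openai import OpenAI

# Point the OpenAI client at the locally running llamafile server (default port assumed).
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="qwen3-0.6b",  # label only; the llamafile serves whatever model it was built with
    messages=[{"role": "user", "content": "Give one tip for running LLMs on a laptop."}],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```
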
## Thedrummer Rivermind Lux 12B V1 GGUF
Large Language Model · bartowski · 1,353 · 1
A 12B-parameter large language model processed with llama.cpp's imatrix quantization, offering multiple quantized versions to accommodate different hardware requirements.

## Gemma 3 4b It 4bit DWQ
Large Language Model · mlx-community · 2,025 · 1
A 4-bit DWQ-quantized MLX conversion of the Google Gemma-3-4b-it model, providing efficient text generation.